8 research outputs found
DNN-Based Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement
Multi-frame approaches for single-microphone speech enhancement, e.g., the
multi-frame minimum-variance-distortionless-response (MVDR) filter, are able to
exploit speech correlations across neighboring time frames. In contrast to
single-frame approaches such as the Wiener gain, it has been shown that
multi-frame approaches achieve a substantial noise reduction with hardly any
speech distortion, provided that an accurate estimate of the correlation
matrices and especially the speech interframe correlation vector is available.
Typical estimation procedures of the correlation matrices and the speech
interframe correlation (IFC) vector require an estimate of the speech presence
probability (SPP) in each time-frequency bin. In this paper, we propose to use
a bi-directional long short-term memory deep neural network (DNN) to estimate a
speech mask and a noise mask for each time-frequency bin, from which two
different SPP estimates are derived. To achieve robust performance,
the DNN is trained for various noise types and signal-to-noise ratios.
Experimental results show that the multi-frame MVDR in combination with the
proposed data-driven SPP estimator yields an increased speech quality compared
to a state-of-the-art model-based estimator.
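The distortionless constraint behind the multi-frame MVDR filter can be illustrated with a short sketch. It assumes the commonly used closed form h = Φ_n⁻¹ γ_x / (γ_x^H Φ_n⁻¹ γ_x), where Φ_n is a noise correlation matrix over N consecutive frames and γ_x is the speech IFC vector; the abstract does not state which formulation the paper uses. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
def solve(A, b):
    """Solve A x = b for a small complex matrix (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def mfmvdr_filter(phi_n, gamma_x):
    """Assumed closed form: h = Phi_n^{-1} gamma_x / (gamma_x^H Phi_n^{-1} gamma_x)."""
    v = solve(phi_n, gamma_x)
    denom = sum(g.conjugate() * vi for g, vi in zip(gamma_x, v))
    return [vi / denom for vi in v]


def apply_filter(h, y_frames):
    """Filter output h^H y for one time-frequency bin (current frame plus N-1 past frames)."""
    return sum(hi.conjugate() * yi for hi, yi in zip(h, y_frames))


# Made-up example with N = 2 frames: a Hermitian noise correlation matrix
# and an IFC vector whose first entry is 1 by normalization.
phi_n = [[2.0, 0.5 + 0.5j], [0.5 - 0.5j, 1.0]]
gamma_x = [1.0 + 0j, 0.8 + 0.1j]
h = mfmvdr_filter(phi_n, gamma_x)
```

Applying the filter to γ_x itself returns 1 up to rounding, which is the distortionless property: the speech component correlated with the current frame passes unchanged, while the remaining degrees of freedom are used to reduce noise. This is also why an inaccurate γ_x estimate translates directly into speech distortion.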
Associations of Migration, Socioeconomic Position and Social Relations With Depressive Symptoms – Analyses of the German National Cohort Baseline Data
Objectives: We analyze whether the prevalence of depressive symptoms differs among various migrant and non-migrant populations in Germany and to what extent these differences can be attributed to socioeconomic position (SEP) and social relations. Methods: The German National Cohort health study (NAKO) is a prospective multicenter cohort study (N = 204,878). Migration background (assessed based on citizenship and country of birth of both participant and parents) was used as independent variable; age, sex, Social Network Index, the availability of emotional support, SEP (relative income position and educational status) and employment status were introduced as covariates; and depressive symptoms (PHQ-9) served as dependent variable in logistic regression models. Results: Increased odds ratios of depressive symptoms were found in all migrant subgroups compared to non-migrants and varied by region of origin. Elevated odds ratios decreased when SEP and social relations were included. Attenuations varied across migrant subgroups. Conclusion: The gap in depressive symptoms can partly be attributed to SEP and social relations, with variations between migrant subgroups. The integration paradox is likely to contribute to the explanation of the results. Future studies need to consider heterogeneity among migrant subgroups whenever possible.
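The notion of an odds ratio that "decreases" once covariates are included can be made concrete with a small arithmetic sketch. The counts and odds ratios below are invented for illustration and are not NAKO results; the attenuation measure (share of the excess odds removed by adjustment) is one common convention and is an assumption here, not necessarily the one used in the study.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Crude odds ratio from a 2x2 table."""
    return (exposed_cases / exposed_noncases) / (unexposed_cases / unexposed_noncases)


def excess_odds_attenuation(or_crude, or_adjusted):
    """Share of the excess odds (OR - 1) that disappears after adjustment."""
    return (or_crude - or_adjusted) / (or_crude - 1.0)


# Hypothetical 2x2 table: 30 of 100 exposed and 20 of 100 unexposed have symptoms.
crude = odds_ratio(30, 70, 20, 80)

# Hypothetical crude and adjusted odds ratios from two regression models.
attenuation = excess_odds_attenuation(1.8, 1.4)
```

With these made-up inputs, half of the excess odds is accounted for by the added covariates. Comparing such a share across subgroups is one way to express that attenuations vary.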
Joint Multi-Channel Dereverberation and Noise Reduction Using a Unified Convolutional Beamformer With Sparse Priors
Recently, the convolutional weighted power minimization distortionless
response (WPD) beamformer was proposed, which unifies multi-channel weighted
prediction error dereverberation and minimum power distortionless response
beamforming. To optimize the convolutional filter, the desired speech component
is modeled with a time-varying Gaussian model, which promotes the sparsity of
the desired speech component in the short-time Fourier transform domain
compared to the noisy microphone signals. In this paper we generalize the
convolutional WPD beamformer by using an lp-norm cost function, introducing an
adjustable shape parameter that controls the sparsity of the desired
speech component. Experiments based on the REVERB challenge dataset show that
the proposed method outperforms the conventional convolutional WPD beamformer
in terms of objective speech quality metrics.
Comment: ITG Conference on Speech Communication
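One generic way to minimize an lp-norm cost is iteratively reweighted least squares (IRLS), where each iteration solves a weighted power minimization with weights computed from the current estimate. The sketch below shows only that reweighting step, under the assumption that the weights are |d|^(p-2); whether the paper uses exactly this scheme is not stated in the abstract. The function names and sample values are hypothetical.

```python
def lp_cost(d, p):
    """Sum of |d_t|^p over STFT coefficients of the desired-speech estimate."""
    return sum(abs(x) ** p for x in d)


def irls_weights(d, p, eps=1e-8):
    """Assumed IRLS weights |d_t|^(p-2); eps guards against division by zero."""
    return [max(abs(x), eps) ** (p - 2) for x in d]


def weighted_power(d, weights):
    """Weighted power sum_t w_t |d_t|^2, the quantity minimized in each IRLS step."""
    return sum(w * abs(x) ** 2 for w, x in zip(weights, d))


# Made-up STFT coefficients and shape parameter.
d = [0.9 + 0.1j, 0.05j, -0.02, 1.2 - 0.4j]
p = 0.5
w = irls_weights(d, p)
```

At the point where the weights are computed, the weighted power equals the lp cost, so each reweighted problem is a local quadratic stand-in for the sparse cost. Smaller values of p put larger weights on low-magnitude coefficients, which pushes them toward zero and yields a sparser estimate.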
Joint estimation of RETF vector and power spectral densities for speech enhancement based on alternating least squares
The multi-channel Wiener filter (MWF) is a well-known multi-microphone speech enhancement technique, aiming at improving the quality of the recorded speech signals in noisy and reverberant environments. Assuming that reverberation and ambient noise can be modeled as a diffuse sound field and the spatial coherence of the residual noise is known, the MWF requires estimates of the relative early transfer function (RETF) vector of the target speaker as well as the power spectral densities (PSDs) of the target, diffuse, and residual noise components. RETF vector and PSD estimation is often decoupled, with each quantity estimated independently of the other. In this paper, we propose to jointly estimate the RETF vector and all PSDs by minimizing the Frobenius norm of a model-based error matrix using an alternating least squares method. Experimental results using different dynamic acoustic scenarios with a moving speaker show that the proposed method leads to better MWF performance than a state-of-the-art method based on covariance whitening.
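The alternating least squares idea can be shown on a toy problem: fit a rank-1 model u vᵀ to a matrix R by minimizing the Frobenius norm of the error, updating one factor in closed form while the other is held fixed. This is a deliberately simplified, real-valued stand-in, not the paper's estimator; the actual signal model (RETF vector plus several PSDs) is richer, and all names below are hypothetical.

```python
def als_rank1(R, iterations=20):
    """Alternately solve the least-squares problems for u (v fixed) and v (u fixed)."""
    rows, cols = len(R), len(R[0])
    v = [1.0] * cols  # simple initialization; assumes R v is not the zero vector
    u = [0.0] * rows
    for _ in range(iterations):
        vv = sum(x * x for x in v)
        u = [sum(R[i][j] * v[j] for j in range(cols)) / vv for i in range(rows)]
        uu = sum(x * x for x in u)
        v = [sum(R[i][j] * u[i] for i in range(rows)) / uu for j in range(cols)]
    return u, v


def frobenius_error(R, u, v):
    """Frobenius norm of the model error R - u v^T."""
    return sum(
        (R[i][j] - u[i] * v[j]) ** 2 for i in range(len(R)) for j in range(len(R[0]))
    ) ** 0.5


# Made-up matrix that is exactly rank 1, so the model can fit it perfectly.
R = [[3.0, 4.0, 5.0], [6.0, 8.0, 10.0]]
u, v = als_rank1(R)
err = frobenius_error(R, u, v)
```

Each update is an ordinary least-squares solution, so the error never increases from one step to the next. Coupling the factors in a single cost is what distinguishes a joint estimate from a decoupled one, where each quantity is fitted without regard to the other.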
Speaker-conditioning Single-channel Target Speaker Extraction using Conformer-based Architectures
Target speaker extraction aims at extracting the target speaker from a
mixture of multiple speakers exploiting auxiliary information about the target
speaker. In this paper, we consider a complete time-domain target speaker
extraction system consisting of a speaker embedder network and a speaker
separator network which are jointly trained in an end-to-end learning process.
We propose two different architectures for the speaker separator network which
are based on the convolutional augmented transformer (conformer). The first
architecture uses stacks of conformer and external feed-forward blocks
(Conformer-FFN), while the second architecture uses stacks of temporal
convolutional network (TCN) and conformer blocks (TCN-Conformer). Experimental
results for 2-speaker mixtures, 3-speaker mixtures, and noisy mixtures of
2 speakers show that, among the proposed separator networks, the TCN-Conformer
significantly improves the target speaker extraction performance compared to
the Conformer-FFN and a TCN-based baseline system.
Comment: submitted to IWAENC 202